Risque et TAL : détection, prévention, gestion. Introduction au 1er atelier [Risk and NLP: detection, prevention, management. Introduction to the 1st workshop]

Author: Grabar Natalia
Tanguy Ludovic
Publication venue: HAL CCSD
Publication date: 01/01/2016
This article introduces the first workshop dedicated to Risk and NLP, which addresses the use of natural language processing methods for the detection, prevention and management of risk. The papers presented during the workshop come from both academic and industrial actors. They cover the main domains in which risk is a central concern because of the scale of the consequences to be avoided, such as biomedicine (medicine and pharmacology), chemistry and transportation, but also address more transversal issues of human activity such as professional environments and technical documentation and requirements. The works presented also show the variety of the processed data (intervention reports, social network communications, academic papers, surveys, technical documentation), of the objectives of the analyses (extraction of risk-related information, ambiguity control, documentation checking), and of the technical solutions (data collection, corpus analysis, resource development).

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Repérage de relations sémantiques entre termes : sur la piste de la morphologie dérivationnelle [Identifying semantic relations between terms: on the trail of derivational morphology]

Author: Grabar Natalia
Hamon Thierry
Publication venue: Presses Universitaires de Grenoble
Publication date: 05/05/2004
Our work is devoted to identifying semantic relations between terms. In this context of building structured terminologies, we are particularly interested in what a morphology-based approach can contribute compared with other techniques for acquiring semantic relations from corpora. Among the operations available in morphology, we exploit affixation and compounding. We also pay attention to the suppletion of bases. We show some of the interpretive patterns that emerge and indicate the semantic relations that can then be derived from them.

HAL Descartes

HAL-Paris 13

Parallel sentence retrieval from comparable corpora for biomedical text simplification

Author: Cardon Rémi
Grabar Natalia
Publication venue: HAL CCSD
Publication date: 02/09/2019
Parallel sentences provide semantically similar information which can vary on a given dimension, such as language or register. Parallel sentences with register variation (as in expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make given information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet, no such resources are currently available. We propose to exploit comparable corpora which are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and are related to the biomedical area. Manually created reference data show 0.76 inter-annotator agreement. Our purpose is to state whether a given pair of specialized and simplified sentences is parallel and can be aligned or not. We treat this task as binary classification (alignment/non-alignment). We perform experiments with a controlled ratio of imbalance and on the highly unbalanced real data. Our results show that the method we present here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.

Detection and analysis of medical misbehavior in online forums

Author: Bigeard Elise
Grabar Natalia
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/10/2019
Social media is an important source of information on the behaviour and habits of users. It has been used as such in public health research to monitor adverse drug effects and drug misuse, among others. We propose to study drug non-compliance in online health forums. First, we use supervised classification to detect non-compliance messages and obtain an F-measure of 0.436. Then, we manually analyse the content of the messages to learn what kinds of behaviour can be detected, and to study the effect that social media can have on patients' compliance behaviour.

Automatic detection of parallel sentences from comparable biomedical texts

Author: Cardon Rémi
Grabar Natalia
Publication venue: HAL CCSD
Publication date: 07/04/2019
Parallel sentences provide semantically similar information which can vary on a given dimension, such as language or register. Parallel sentences with register variation (as in expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make given information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet, no such resources are currently available. We propose to exploit comparable corpora which are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and are related to the biomedical area. Our purpose is to state whether a given pair of specialized and simplified sentences is to be aligned or not. Manually created reference data show 0.76 inter-annotator agreement. We treat this task as binary classification (alignment/non-alignment). We perform experiments on balanced and imbalanced data. The results on balanced data reach up to 0.96 F-measure. On imbalanced data, the results are lower but remain competitive when using classification models trained on balanced data. Besides, among the three datasets exploited (semantic equivalence and inclusions), the detection of equivalence pairs is more efficient.

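...des conférences enfin disons des causeries... Détection automatique de segments en relation de paraphrase dans les reformulations de corpus oraux. [...lectures, well, let's say talks... Automatic detection of segments in a paraphrase relation in reformulations from spoken corpora]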

Author: Eshkol-Taravella Iris
Grabar Natalia
Publication venue: HAL CCSD
Publication date: 22/06/2015
Our work addresses the automatic detection of segments in a paraphrastic rephrasing relation in spoken corpora. The proposed approach is syntagmatic. It is based on paraphrastic rephrasing markers and on the specificities of spoken language. The reference data used are consensual. An automatic method based on machine learning with CRFs is proposed in order to detect the paraphrased segments. Different descriptors are exploited within a window of variable size. The tests performed indicate that segments in a paraphrastic relation are quite difficult to detect, especially with their correct boundaries. Our best averages reach 0.65 F-measure, 0.75 precision, and 0.63 recall. We have several perspectives for this work, both for improving the detection of segments in a paraphrastic relation and for studying the data from other points of view.
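For readers less familiar with the metrics quoted in these abstracts, the following is a minimal sketch of how an F-measure is derived from precision and recall. The function name `f_measure` and its `beta` parameter are illustrative assumptions, not taken from any of the listed papers. Note that an average of per-run F-measures, as reported above, need not equal the F-measure computed from averaged precision and recall.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Hypothetical helper for illustration only.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# F1 computed directly from a precision of 0.75 and a recall of 0.63
print(round(f_measure(0.75, 0.63), 4))
```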

HAL Université de Tours

Detection and analysis of drug non-compliance in internet fora using information retrieval approaches

Author: Bigeard Élise
Grabar Natalia
Thiessard Frantz
Publication venue: HAL CCSD
Publication date: 07/04/2019
In the health-related field, drug non-compliance situations occur when patients do not follow their prescriptions and take actions which lead to potentially harmful situations. Although such situations are dangerous, patients usually do not report them to their physicians. Hence, it is necessary to study other sources of information. We propose to study online health fora with information retrieval methods in order to identify messages that contain drug non-compliance information. Information retrieval methods make it possible to detect non-compliance messages with up to 0.529 F-measure, compared to the 0.824 F-measure reached with supervised machine learning methods. For some fine-grained categories and on new data, the approach reaches up to 0.70 precision.

Simplification-induced transformations: typology and some characteristics

Author: Cardon Rémi
Grabar Natalia
Koptient Anaïs
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2019
The purpose of automatic text simplification is to transform technical or difficult-to-understand texts into a more accessible version. The semantics must be preserved during this transformation. Automatic text simplification can be done at different levels (lexical, syntactic, semantic, stylistic...) and relies on the corresponding knowledge and resources (lexicon, rules...). Our objective is to propose methods and material for the creation of transformation rules from a small set of parallel sentences differentiated by their technicality. We also propose a typology of transformations and quantify them. We work with French-language data related to the medical domain, although we assume that the method can be exploited on texts in any language and from any domain.

Crossref

HAL Descartes

Speculation and negation detection in French biomedical corpora

Author: Claveau Vincent
Dalloux Clément
Grabar Natalia
Publication venue: HAL CCSD
Publication date: 02/09/2019
In this work, we propose to address the detection of negation and speculation, and of their scope, in French biomedical documents. It has indeed been observed that they play an important role and provide crucial clues for other NLP applications. Our methods are based on CRFs and BiLSTM. We reach up to 97.21% and 91.30% F-measure for the detection of negation and speculation cues, respectively, using CRFs. For the computation of scope, we reach up to 90.81% and 86.73% F-measure on negation and speculation, respectively, using a BiLSTM-CRF fed with word embeddings.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Recherche d'information médicale pour le patient Impact de ressources terminologiques [Medical information retrieval for patients: impact of terminological resources]

Author: Claveau Vincent
Grabar Natalia
Hamon Thierry
Le Maguer Sébastien
Publication venue: HAL CCSD
Publication date: 18/03/2015
The right of patients to access their clinical health record is granted by the French Public Health Code (code de la Santé Publique). Yet, this content remains difficult to understand. We propose an experiment in which we use queries defined by patients in order to find relevant documents. We use the Indri search engine, based on statistical language modeling, together with semantic resources. We focus on terminological variation (e.g. synonyms, abbreviations) to make the link between expert and patient languages. Various combinations of resources and Indri settings are explored, mostly based on query expansion. Our system shows up to 0.7660 P@10 and up to 0.6793 NDCG@10.
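As a rough illustration of the two ranking metrics quoted in this abstract, the sketch below computes P@k and NDCG@k from a ranked list of relevance judgments. The function names and the toy relevance list are assumptions made for illustration; they are not the authors' evaluation code. The ideal ranking here is derived from the retrieved list only, a simplification, since a full evaluation would normally use all judged documents for the query.

```python
import math


def precision_at_k(relevances, k=10):
    """Fraction of the top-k results judged relevant (relevance > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: gains discounted by log2 of (rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the best reordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for one query, in ranked order (1 = relevant)
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(ranked), round(ndcg_at_k(ranked), 4))
```

Both values are computed per query and then averaged over the query set when a single system-level figure is reported.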

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Paris 13

Hal-Diderot

HAL-Rennes 1